[…] of postal addresses



Data representing a postal address (30) is retrieved […]. A processor (2) searches the dictionary (4) for entries (31) exactly corresponding to the search terms and for entries (32a, 32b, 32c) allowing for the possibility of the input terms containing errors. The processor then finds, by reference to a location index (5), […] coded postal address elements (27a, 27b, 28 […] 28c) corresponding to the entries (31, 32a, 32b, 32c) in the dictionary (4), […] determined […] corresponding to the input terms (8a, 8b). […] (9).
Description



[…] able the method to be performed […].

[0002] […] providing an apparatus, […] a computer system that operates on a database representing a multiplicity of postal addresses, […] the correct postal address from the database by […] computer system with […]. Such systems (referred to in some […]) are known in the prior art and […] obtaining postal address details, fewer keystrokes being required […] for a keyboard operator to obtain […]. Further, […] the database on which the computer system operates is accurate and up to date (for example, […] authorities in the relevant region).

[0003] A common […] especially useful is when […] details of an […] information can readily access […] and find the correct address on the basis of the information given […] possible misinterpretation might otherwise lead to […] details being […]. […] benefit […] when entering address details […] information, which may be […] read […].

[0004] An example of such a system known in the prior art is a computer program product sold by us known as QuickAddress PRO […]. The software of that product was written for a database of postal addresses […] around the postcode system presently used in the UK. In the UK a postcode represents on average about 15 addresses, […] require the operator only to enter […] a house number in order to obtain full details […].

[0005] In countries other than the UK, postal codes (sometimes referred to as […]) […] many more than […] addresses, and in some countries cover more than one […]. The data structures used in QuickAddress PRO […] may not be the most appropriate […] relating to a country other than the UK.
[0006] A further product sold by us is known as QuickAddress™ PRO World Version (Version 1) […] for use with a database […]. The software used to search for […] addresses makes use of a searching method known as "pattern matching". In pattern matching, […] are converted into a series of three letter strings, which are compared with a store of all possible three letter strings […] postal addresses […] contained within […] require a significant amount […].

[0007] The present invention seeks to provide an improved method of retrieving data representing a postal address from a database representing a multiplicity of postal addresses. The present invention also seeks to provide a computer program […] an improved method and to […].

[…] comprising the following steps:

[…] databases, […] receiving input data comprising […] the first database representing the or […]






EP 1 197 885 A2 



[…] database determined by the processor in view of the information ascertained in step f) as being in accordance with the input data, wherein

the dictionary is in the form of a tree data structure having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of an element of a postal address.

[…] the target postal address (i.e. the postal address intended to be retrieved by the input data) without the need for relying on a particular format of postal code. Furthermore, having a dictionary arranged in such a way […] manner than the prior art method using three letter strings. It will be understood that a leaf node generally represents a termination point within the tree, although in the present invention a termination point need not necessarily be a "pure" leaf node (as is explained below).

The databases are preferably in electronic form, readable by a computer processor. For example, the databases may each be partially or wholly stored in […]

The output data concerning the or each postal address […] in addition to the data required to […] that form the postal address elements of the full address. For example, the additional output data may include […] the additional output data may, for each postal […] the category (or type) of postal address element. […] the output data is in a form that enables […] may be sent to the intended recipient via conventional postal delivery services.
The first database of data representing a multiplicity […] as a tree data structure having a root node and terminating in a multiplicity of leaves, the path from a root […] addresses in the USA, the nodes in the first level below the root (i.e. one step away from the root) may represent states, and the nodes in the next level may represent counties within each state […].
Each postal address element may be comprised of […] the address. Each postal address element may comprise sub-elements, for example separate words. For example, the words "NEW YORK" may form a single postal address element. Alternatively, one postal address element may be required to represent each word or sub-element of an address.
Advantageously, there is provided data enabling the processor to determine whether a pair of nodes of the first database, at different locations (for […]), relate to the same postal address. For example, each node in the tree may be assigned an off-set value indicating the distance in memory to the next node at the same level, the data being so arranged that two nodes […] value associated with the first node in memory (the second node being stored further ahead in the linear memory […]) […] in the linear store and before the next node in the […] node B having children nodes D and E and child node […] and […] to define whether a pair of nodes of the first database relate to the same postal address […] respective levels of the nodes in the tree data structure.
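By way of illustration only, the off-set scheme can be sketched in Python. The sketch assumes that the nodes of the first database are laid out depth-first (each node immediately followed by its whole subtree), so that a node's off-set, the distance to the next node at the same level, equals the size of its subtree; under that assumption node j lies below node i exactly when i < j < i + offset(i). The names `Node`, `linearise`, `is_ancestor` and `same_address`, and the example place names, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def linearise(root: Node) -> tuple[list[str], list[int]]:
    """Depth-first layout; offsets[i] is the distance from node i to the next
    node stored at the same level (i.e. just past node i's subtree)."""
    labels: list[str] = []
    offsets: list[int] = []

    def visit(node: Node) -> None:
        index = len(labels)
        labels.append(node.label)
        offsets.append(0)  # placeholder until the subtree has been laid out
        for child in node.children:
            visit(child)
        offsets[index] = len(labels) - index

    visit(root)
    return labels, offsets


def is_ancestor(offsets: list[int], i: int, j: int) -> bool:
    """True if node j is stored inside node i's subtree."""
    return i < j < i + offsets[i]


def same_address(offsets: list[int], i: int, j: int) -> bool:
    """Two nodes relate to the same postal address when one lies on the
    other's root-to-leaf path, i.e. one is an ancestor of the other."""
    return i == j or is_ancestor(offsets, i, j) or is_ancestor(offsets, j, i)


# Hypothetical example: a state with two counties, one having two streets.
root = Node("NY", [Node("KINGS", [Node("MAIN ST"), Node("ELM ST")]), Node("QUEENS")])
labels, offsets = linearise(root)
```

Here `offsets` is `[5, 3, 1, 1, 1]`, so `same_address(offsets, 1, 2)` (KINGS, MAIN ST) is true while `same_address(offsets, 2, 3)` (MAIN ST, ELM ST) is false. Only the two off-sets are consulted, not the contents of the nodes.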

[0017] Preferably, the data representing each postal […] for the address element. Having such codes enables the data representing the addresses to take up less storage space. For example, certain words may occur many […] postal address element is stored. Preferably there is provided a separate data store, in […] enabling the codes representing postal address elements to be decoded by reference to the separate data store.
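A minimal Python sketch of the coding scheme follows, assuming that each distinct element text receives the next free integer code and that the separate data store is simply a list indexed by code. The function names and sample addresses are illustrative only.

```python
def encode_addresses(addresses):
    """Replace each distinct postal address element with a small integer code.

    Returns the coded addresses and the separate decoding store (code -> text).
    """
    code_of = {}
    decode_table = []
    coded = []
    for address in addresses:
        row = []
        for element in address:
            if element not in code_of:
                code_of[element] = len(decode_table)
                decode_table.append(element)
            row.append(code_of[element])
        coded.append(row)
    return coded, decode_table


def decode(codes, decode_table):
    """Recover the element texts for a coded address."""
    return [decode_table[code] for code in codes]


addresses = [
    ["HIGH STREET", "OXFORD"],
    ["HIGH STREET", "CAMBRIDGE"],
    ["MILL LANE", "OXFORD"],
]
coded, decode_table = encode_addresses(addresses)
```

`coded` becomes `[[0, 1], [0, 2], [3, 1]]`: "HIGH STREET" and "OXFORD" are each stored once in `decode_table`, however often they recur.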
[…] of providing a dictionary in the form of the second […] readable database, the dictionary comprising data representing entries, each entry corresponding to […] address element represented by […]

Features relating to that dictionary will now be described. Preferably each entry of the dictionary, represented by the path from each leaf of the tree to the root, is unique: the same address element represented in different parts of the first database corresponds to only one entry in the dictionary. However, there may be more than one dictionary. The dictionary may also comprise more than one tree structure. Separate dictionaries or separate tree structures of a dictionary may facilitate faster searching. A plurality of dictionaries may be searched on the input of a plurality of input terms. The dictionaries may be searched one at a time, for a plurality of input terms in parallel. Alternatively, the dictionaries may be searched in parallel for a plurality of input terms.

Advantageously, the dictionary is so arranged that nodes of the tree data structure after the root node […] entry in the dictionary sharing the unique stem defined […] being after the stem, and a plurality of the nodes have […] a plurality of such portions. The data structure may effectively be considered as a modified "trie" data structure wherein each node effectively represents a single […], but the data representing said portion is held in […]. […] nodes preferably contain data representing […] a number of separate portions, said number being greater than or equal to the number of child nodes. Such a structure may facilitate faster searching of the dictionary, because the processor can discount child nodes as being irrelevant to the search in question without having to follow a pointer to such child nodes.
Some of the portions representing the data may include a middle part only of the entry, thereby ex[…] data structure has no more than one child node. It will be understood that a node can include a termination point (the node effectively acting partly as a leaf and partly as a parent node). A node may also include a plurality of termination points. The data tree may include pure leaf nodes (i.e. childless nodes) and mixed leaf/parent nodes (i.e. nodes having at least one child and at least one termination point). For example, a node of the tree of the dictionary may represent many elements sharing the same stem, the stem itself being a postal address element (for example, a node wherein the path from the root to the node represents "LONDON", which is itself an address element, must also have portions in nodes at lower levels if elements such as "LONDONDERRY" or "LONDON ROAD" are to be represented). The dictionary thus preferably includes data representative of end of string characters.
[0022] Conveniently the last character of the portion of each dictionary entry represented by a termination point, whether a leaf by itself or part of a parent node, […]

[0023] Preferably each of a plurality of portions contained within a node of at least some nodes of said plurality of nodes is a single character. For example, if a node of the tree data structure representing the dictionary has more than two child nodes, it is preferable […]. At least some nodes may be such that all of the portions are a single character. The root node may be one such node, for example.
20 one such node for example. „o„„oHthatat 
[0024] Preferably, the dictionary is so arranged that at least some of said portions of the nodes each comprise a plurality of characters. For example, in the case of a given node either having only one child node or being a pure leaf node, if the given node would otherwise be represented as a series of single child nodes, it is preferable for the node to represent the series of nodes collapsed into a single node, so that the given node contains, as a single portion, all the characters that would otherwise be represented by that series of single child nodes.
[0025] By way of example, consider the dictionary […] may be used to represent the elements […]. […] contains two portions, one containing a "B", the other containing an "L". The "L" portion of the root node points to a pure leaf node containing the portion "ONDON", […] a leaf representing "UAREH" […].
[0026] The dictionary structure may be a structure similar to a trie structure, where the letters in the nodes have effectively been promoted back towards the root to their parent nodes.
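The dictionary structure described above closely resembles what is commonly called a radix (or compressed) trie. The Python sketch below uses that well-known structure as a stand-in, under stated assumptions: each node holds, keyed by first character, the portion of text leading to each child (so a child can be rejected from its parent's data alone), chains of single children are collapsed into one multi-character portion, and a `terminal` flag plays the part of the end-of-string marker. `RadixNode`, `insert` and `contains` are hypothetical names, and the sketch does not enforce single-character portions at the root.

```python
class RadixNode:
    """A dictionary node whose outgoing portions are stored in the node itself."""

    def __init__(self):
        self.edges = {}        # first character -> [portion, child node]
        self.terminal = False  # True if a dictionary entry ends at this node

    def insert(self, word):
        node = self
        while word:
            edge = node.edges.get(word[0])
            if edge is None:
                # Collapsed chain: the whole remainder becomes one portion.
                leaf = RadixNode()
                leaf.terminal = True
                node.edges[word[0]] = [word, leaf]
                return
            portion, child = edge
            common = _common_prefix_len(portion, word)
            if common < len(portion):
                # Split the portion: keep the shared stem, push the rest down.
                middle = RadixNode()
                middle.edges[portion[common]] = [portion[common:], child]
                edge[0], edge[1] = portion[:common], middle
                child = middle
            node, word = child, word[common:]
        node.terminal = True  # a mixed leaf/parent node if it also has edges

    def contains(self, word):
        node = self
        while word:
            edge = node.edges.get(word[0])
            if edge is None or not word.startswith(edge[0]):
                return False  # rejected using the parent's data only
            word = word[len(edge[0]):]
            node = edge[1]
        return node.terminal


def _common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


dictionary = RadixNode()
for entry in ["london", "londonderry", "londonroad", "bath"]:
    dictionary.insert(entry)
```

After these insertions the root holds the portions "london" and "bath"; the node reached by "london" is both a termination point and the parent of the portions "derry" and "road", which is the mixed leaf/parent case. Spaces are assumed to have been removed from entries beforehand.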

[0027] Step e) is preferably performed so that the processor accesses the data relating to any given node in the dictionary no more than once. Step e) may, however, be performed more than once, as is explained below; in such a case the processor may access a given node in the dictionary more than once during step e), but the processor preferably accesses the data in that node no more than once per search of the dictionary.
[0028] Advantageously, the dictionary is stored in memory (whether RAM, ROM or otherwise) such that the processor is able to have faster access to those nodes in the dictionary that are most commonly accessed, compared with the access times relating to nodes that are accessed less frequently. For example, the dictionary may be stored as a […] of nodes, the root node being stored first in the store, the child nodes of the root node being stored thereafter, and the child nodes of those child nodes being stored after the last child node of the root node. Storing the dictionary in such a way also allows the pointers that enable the processor to find the child nodes of a given node to be expressed as offset values, those offset values generally being relatively small in […] (and therefore collectively taking up less memory than they might if the nodes were stored in another manner), owing to the fact that the children of a given node may be grouped together.
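The level-by-level storage order can be sketched as follows, assuming the tree is supplied as a hypothetical `children_of` mapping from a unique node name to its ordered child names. Because a breadth-first traversal appends the children of each node together, each node needs only one small relative offset (to its first child) plus a child count.

```python
from collections import deque


def layout_level_order(children_of, root):
    """Store nodes level by level; return (order, first_child_offset, child_count).

    children_of maps a node name to the ordered list of its child names.
    """
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children_of.get(node, []))
    position = {name: i for i, name in enumerate(order)}
    first_child_offset, child_count = [], []
    for i, name in enumerate(order):
        kids = children_of.get(name, [])
        child_count.append(len(kids))
        # Siblings are contiguous, so one relative offset locates all of them.
        first_child_offset.append(position[kids[0]] - i if kids else 0)
    return order, first_child_offset, child_count


children_of = {"root": ["B", "L"], "B": ["BA", "BE"], "L": ["LO"]}
order, first_child_offset, child_count = layout_level_order(children_of, "root")
```

For this example the layout is `root, B, L, BA, BE, LO` with offsets `[1, 2, 3, 0, 0, 0]`. Since the levels nearest the root come first, keeping a prefix of the array in fast memory is one way of holding those levels there.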

[0029] Preferably, the dictionary may be stored such that nodes in a level closer to the root node are stored […] nodes in levels further away from the root node. Preferably at least the root node is stored in fast memory (i.e. a memory medium, such as RAM memory, that the processor is able to access faster than other types of […] disk). More preferably, the root node and its children nodes are stored in fast memory. Even more preferably, the root node and the nodes in a plurality of levels below the root node are stored in fast memory.
[0030] Step e) may be so performed as to find if there […] each input data term. If more than one input data term […] dictionary corresponding thereto, and there is a single postal address that contains postal address elements corresponding to all of the dictionary entries found by the processor […], then the method preferably outputs data relating to that single postal address. It is likely in such a case that the single postal address is the address intended to be retrieved by the input data.
[0031] If, however, there are one or more input data […] and/or there are a plurality of postal addresses […] corresponding to the input data, or no such postal […], the method may include further steps either to output data relating to the status of the processor's findings (for example, to inform a user of the method that a single address could not be found) and/or to perform further steps in an attempt to improve the likelihood of retrieving the intended postal address. The provider of the input data initially provided (the provider being, for example, a user or a machine such as another computer system) might, as a part of this process, be prompted for further or different input.
[0032] Steps f) and g) of the method may include ascertaining the location of each occurrence of data within the first database corresponding to the or each entry in the dictionary determined by the processor as corresponding to the one or more input terms, and determining from those locations the postal address or addresses being in accordance with the input data. For example, the processor may simply determine from the locations of all occurrences the postal address or addresses having (or sharing) the greatest number of […].
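As a hedged sketch of steps f) and g), and of the outcomes described for step e) above, the following Python assumes that step f) has already produced, for each matched input term, the set of (hypothetical) address identifiers whose elements matched it. It counts matched terms per address and reports a single address only when exactly one address accounts for every input term.

```python
from collections import Counter


def rank_addresses(occurrences, n_terms):
    """occurrences: one set of address ids per matched input term.

    Returns (status, addresses sharing the greatest number of matched terms).
    """
    counts = Counter()
    for address_ids in occurrences:
        counts.update(set(address_ids))
    if not counts:
        return "no match", []
    best = max(counts.values())
    winners = sorted(a for a, c in counts.items() if c == best)
    if best == n_terms and len(winners) == 1:
        return "single address", winners
    return "ambiguous or partial", winners
```

`rank_addresses([{1, 2}, {2, 3}], 2)` returns `("single address", [2])`, whereas `rank_addresses([{1, 2}, {3}], 2)` returns `("ambiguous or partial", [1, 2, 3])`, the case in which the user might be prompted for further input.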

[0033] Advantageously, there is provided a separate store of data enabling the processor, without needing to refer to the first database, i) to determine the location of the or each node within the first database corresponding to an entry in the dictionary and ii) to determine whether a pair of nodes of the first database, at different locations in the tree data structure, relate to the same postal address. Each entry in the dictionary may, for example, be linked to one or more entries in the separate data store, each entry in the separate data store including i) a pointer to the location of the node in the first database representing the postal address element corresponding to the dictionary entry and ii) an off-set value indicating the distance in memory from that node to the next node at the same level. The processor may therefore be able to ascertain which of a multiplicity of potential matches in the first database belong to the same postal address simply by accessing the separate store of data, without needing to access the first database. Thus, steps f) and g) of the method preferably include a step of ascertaining the location of each node in the first database corresponding to the or each entry in the dictionary determined by the processor as corresponding to the one or more input terms, and then determining, from the locations so ascertained in conjunction with data (for example off-set values) from the separate store of data, the postal address or addresses being in accordance with the input data, whereby the processor need not access the first database in respect of those nodes which do not correspond to the postal address(es) judged to be in accordance with the input data.
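A minimal sketch of the separate data store in use: each dictionary entry is assumed to map to a list of `(position, offset)` pairs, giving the position of a node in a depth-first layout of the first database and its off-set to the next node at the same level. Choosing one node per input term and keeping only choices whose nodes all lie on one root-to-leaf path identifies candidate addresses without reading the first database. `location_index`, `related` and `consistent_matches` are hypothetical names, and the exhaustive product is chosen for clarity rather than speed.

```python
from itertools import combinations, product


def related(a, b):
    """True if one node lies inside the other's subtree (same root-to-leaf path)."""
    (i, off_i), (j, off_j) = a, b
    return i == j or i < j < i + off_i or j < i < j + off_j


def consistent_matches(location_index, terms):
    """Yield one (position, offset) per term such that all chosen nodes belong
    to a single postal address; only the location index is consulted."""
    candidates = [location_index.get(term, []) for term in terms]
    for choice in product(*candidates):
        if all(related(a, b) for a, b in combinations(choice, 2)):
            yield choice


# Hypothetical depth-first layout: 0 NY, 1 KINGS, 2 MAIN ST, 3 ELM ST,
# 4 QUEENS, 5 MAIN ST, 6 OAK ST.
location_index = {
    "KINGS": [(1, 3)],
    "QUEENS": [(4, 3)],
    "MAIN ST": [(2, 1), (5, 1)],
}
```

`list(consistent_matches(location_index, ["MAIN ST", "QUEENS"]))` gives `[((5, 1), (4, 3))]`: the MAIN ST at position 2 is discarded because it lies under KINGS, not QUEENS. A term with no entry in the index yields no combinations.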
[0034] Rather than aiming to retrieve a single address, the method may […] details of a plurality of addresses, it being left to the user to settle which of those addresses, if more than one are retrieved, is the target address.
[0035] Step e) thus preferably includes the processor initially searching the dictionary for any entry in the dictionary identical to the or each input term. If during step e) no entry is found that is identical to any of the input terms, the processor may then search the dictionary for entries having a lower quality correspondence with the or each input term. For example, on not finding any entries […], the processor may then search allowing for one error at first, then for two errors, and so on. A single error may be counted where the search term and the dictionary entry differ by one character being deleted, added or replaced with a different character. The quality of correspondence between the two may thus be expressed by the "Levenshtein distance" (or edit distance) between them.
[0036] As mentioned above, the searching of the dictionary is preferably performed so that, for […] of the dictionary, the processor need access the data represented by each node no more than once. Therefore, if the processor is to allow for one or more errors, […] any given node allow for more than one of its child nodes as representing a possible route through the dictionary to a matching entry in the dictionary (allowing for one or more errors). For example, if the processor is to allow for one error, all of the child nodes of the root node can be of relevance, because the first letter of the input term may be treated as being substitutable by a different character, as being an erroneous added character, or as being representative of the second character of a given entry in the dictionary (i.e. the data input term missing the first character of the target dictionary entry). On allowing for one or more errors, the nodes closest to the root node will be of much greater relevance than nodes on levels further away from the root node. Having the data relating to nodes closer to the root node in fast […] the dictionary and allowing for one or more errors in the […].
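The single-pass behaviour can be illustrated with a widely used technique, sketched here in Python as an assumption rather than as the specific procedure of this method: the dictionary is walked once, each node receiving a row of the edit-distance table computed from its parent's row, so that several child branches remain open when errors are allowed while no node is processed twice. A branch is abandoned as soon as every value in its row exceeds the permitted number of errors. A plain one-character-per-node trie built from nested dictionaries is used for brevity; `build_trie`, `fuzzy_search` and the `END` marker are hypothetical.

```python
END = None  # marker key meaning "an entry ends here" (end of string)


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def fuzzy_search(trie, term, max_errors):
    """Return {entry: distance} for entries within max_errors edits of term."""
    results = {}

    def visit(node, row):
        if END in node and row[-1] <= max_errors:
            results[node[END]] = row[-1]
        if min(row) > max_errors:
            return  # no extension of this prefix can come back within the limit
        for ch, child in node.items():
            if ch is END:
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(term) + 1):
                new_row.append(min(
                    new_row[j - 1] + 1,                # term character unmatched
                    row[j] + 1,                        # entry character unmatched
                    row[j - 1] + (term[j - 1] != ch),  # match or substitution
                ))
            visit(child, new_row)

    visit(trie, list(range(len(term) + 1)))
    return results
```

With `trie = build_trie(["LONDON", "LONDONDERRY", "LUTON", "BATH"])`, `fuzzy_search(trie, "LONDRON", 1)` returns `{"LONDON": 1}`.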

[0037] If a plurality of input terms are inputted and the processor finds during step e) one or more entries in the dictionary identical to at least one, but not all, of the input terms respectively, thereby leaving one or more unmatched input terms, the processor continues searching the dictionary for entries having a lower quality correspondence with those unmatched input terms. Such a method assumes that if an input term matches a dictionary entry exactly then there is a good chance that the input term is actually correct. Put another way, it may be assumed that there is a relatively high probability that the target postal address includes a postal address element identical to the input term found in the dictionary, it being assumed that it is relatively unlikely for an incorrect data term (i.e. one containing an error) to correspond exactly to a postal address element of a different, and therefore incorrect, postal address. For example, if the input data includes the terms "LONDRON" and "HEATHROW", the term "HEATHROW" would be matched but the term "LONDRON" would not; the processor would then search for dictionary entries corresponding to "LONDRON" allowing for one error, but would not search for the term "HEATHROW" again.
[0038] Similarly, the method may be such that if, on reducing the quality of correspondence required for […], further terms are matched but other terms are still left unmatched, searches for entries having correspondence of even less quality need only be conducted on those remaining unmatched terms. Such a method of searching may save considerable time that would otherwise be spent searching for lower quality matches for terms that have already been found to correspond to dictionary entries with a relatively high quality of correspondence.
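The strategy of relaxing the required quality only for terms still unmatched can be sketched as follows. The sketch assumes a plain list of dictionary entries and a brute-force scan (for clarity only), and `match_terms` is a hypothetical name.

```python
def levenshtein(a, b):
    """Standard edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def match_terms(entries, terms, max_errors=2):
    """Search exactly first; on each later pass allow one more error, but only
    for the terms that are still unmatched.

    Returns ({term: (errors_allowed, hits)}, still_unmatched_terms).
    """
    matched = {}
    unmatched = list(terms)
    for allowed in range(max_errors + 1):
        remaining = []
        for term in unmatched:
            hits = [e for e in entries if levenshtein(term, e) <= allowed]
            if hits:
                matched[term] = (allowed, hits)
            else:
                remaining.append(term)
        unmatched = remaining
        if not unmatched:
            break
    return matched, unmatched
```

For the input terms "LONDRON" and "HEATHROW" against `["LONDON", "HEATHROW", "LUTON"]`, "HEATHROW" is settled on the exact pass and is not searched again; only "LONDRON" is retried, matching "LONDON" once one error is allowed.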
[0039] The searching of the dictionary may alternatively search for entries […] terms, the quality of correspondence being within a given threshold, which may be pre-set and may be fixed. […] entries in the dictionary with the input term(s) if the entry and term concerned are within a pre-set edit distance of each other. Many completely different input terms may be searched in parallel so that the number of passes through the dictionary may be minimised.
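To illustrate searching many input terms in one pass, the trie walk can carry one edit-distance row per term, a branch being abandoned only when every term's row exceeds the pre-set edit distance. This is a hypothetical Python sketch under that assumption; the names and the nested-dictionary trie are illustrative.

```python
END = None  # marker key meaning "an entry ends here"


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def next_row(row, term, ch):
    """Edit-distance row for the trie prefix extended by character ch."""
    new_row = [row[0] + 1]
    for j in range(1, len(term) + 1):
        new_row.append(min(new_row[j - 1] + 1, row[j] + 1,
                           row[j - 1] + (term[j - 1] != ch)))
    return new_row


def fuzzy_search_many(trie, terms, max_distance):
    """One walk of the trie serves every term; returns {term: {entry: distance}}."""
    results = {term: {} for term in terms}

    def visit(node, rows):
        if END in node:
            for term, row in zip(terms, rows):
                if row[-1] <= max_distance:
                    results[term][node[END]] = row[-1]
        if all(min(row) > max_distance for row in rows):
            return  # hopeless for every term, so stop descending
        for ch, child in node.items():
            if ch is not END:
                visit(child, [next_row(row, term, ch) for term, row in zip(terms, rows)])

    visit(trie, [list(range(len(term) + 1)) for term in terms])
    return results
```

With the dictionary `["LONDON", "LONDONDERRY", "HEATHROW", "LUTON"]`, searching for `["LONDRON", "HEATHRW"]` at distance 1 returns `{"LONDRON": {"LONDON": 1}, "HEATHRW": {"HEATHROW": 1}}` from a single traversal.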
[0040] The postal address elements forming a postal address may notionally be divided into categories. The categories may simply be the level in the tree of the first database in which the postal address element appears. The categories may be representative of the type of postal address element. There is preferably provided data enabling the processor to ascertain the category of a given postal address element represented by data in the first database. Such data may implicitly be provided by the structure of the first database. The method may thus be able to distinguish between postal address elements formed of the same characters but being of a different category. For example, if there were entries in the first database relating to both a town and a county named "ABCDEF", it would be beneficial if the processor were able to distinguish between the two.
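As an illustration only, if the category of an element is taken to be its level in the tree of the first database, a town and a county with the same text remain distinguishable. The level names in `LEVEL_NAMES`, the function names and the sample paths are hypothetical.

```python
LEVEL_NAMES = {1: "county", 2: "town", 3: "street"}  # hypothetical level -> category


def elements_with_categories(paths):
    """paths: root-to-leaf lists of element texts (root omitted).

    Returns the set of (text, category) pairs, the category implied by depth.
    """
    seen = set()
    for path in paths:
        for depth, text in enumerate(path, start=1):
            seen.add((text, LEVEL_NAMES.get(depth, f"level {depth}")))
    return seen


def categories_of(elements, text, category=None):
    """Categories under which text occurs, optionally restricted to one category."""
    return sorted(c for t, c in elements
                  if t == text and (category is None or c == category))


paths = [
    ["ABCDEF", "ABCDEF", "HIGH STREET"],
    ["ABCDEF", "OTHERTOWN", "MILL LANE"],
]
elements = elements_with_categories(paths)
```

`categories_of(elements, "ABCDEF")` gives `["county", "town"]`, while an input term tagged as a town, `categories_of(elements, "ABCDEF", "town")`, gives only `["town"]`.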
[0041] In the case where a given postal address element may be attributed with or assigned a category, the processor is preferably able to be provided with input data including an indication of the category of postal address element that each of at least one of the input terms represents. The input data received by the processor in step d) may be processed by the processor before step e) is performed. Alternatively, the input data received by the processor in step d) may be preprocessed.
[0042] The processing of the input data may, for example, be for the purpose of reducing the likelihood of a postal address not being found through differences in data syntax between the input terms and the postal address […] of the first and second databases. For example, the dictionary may be formed of certain characters only, illegal characters of postal address elements either not being represented in the dictionary or being represented by different characters or in a different order. For example, the entry in the dictionary corresponding to a postal address element including a space may exclude the space character. Also, the entries in the dictionary may, for example, be represented without using any upper case letters. If upper case letters in the input […] are converted to lower case letters before the dictionary is searched, the search in step e) may then be case insensitive.
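A minimal normalisation step might look like the following Python, assuming (hypothetically) that dictionary entries use only the characters a to z and 0 to 9, with spaces and punctuation removed and upper case folded to lower case.

```python
import re

_NOT_IN_DICTIONARY = re.compile(r"[^a-z0-9]")  # assumed dictionary character set


def normalise(term):
    """Bring an input term into the assumed dictionary syntax."""
    return _NOT_IN_DICTIONARY.sub("", term.lower())
```

`normalise("High Street")` and `normalise("HIGH STREET")` both give `"highstreet"`, so the subsequent search is case insensitive.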

[0043] Entries represented in the dictionary may be formed such that information concerning the premise number of an address is excluded. For example, "10 High Street" would be represented in the dictionary as "High Street". Thus, when an input term starting with a number is provided as part of the input data, such data may be processed, before step e) is performed, so as to remove the number from the beginning of […] corresponding entries. Once an entry has been found that matches the input term (with the number deleted), the processor may then ascertain whether the postal address element relating to the input term is represented by a node having nodes representing premise numbers as its child nodes, and whether or not any of those child nodes represents the number removed from the input term.
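The removal of a leading premise number, and the later check against the premise-number child nodes, can be sketched as below. The regular expression (digits with an optional single letter suffix) and the representation of the child nodes as a set of strings are assumptions, as are the function names.

```python
import re

_LEADING_NUMBER = re.compile(r"^\s*(\d+[A-Za-z]?)\s+(.*)$")


def split_premise(term):
    """'10 High Street' -> ('10', 'High Street'); otherwise (None, term)."""
    match = _LEADING_NUMBER.match(term)
    return (match.group(1), match.group(2)) if match else (None, term)


def premise_present(premise_children, number):
    """premise_children: the premise numbers held as child nodes of the
    matched street element (assumed here to be a set of strings)."""
    return number is not None and number in premise_children
```

`split_premise("10 High Street")` returns `("10", "High Street")`; the dictionary is then searched for "High Street", and `premise_present({"8", "10", "12"}, "10")` confirms that premise 10 exists under the matched street.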

[0044] In the case where a postal address element includes a number which does not relate to a premise number, the corresponding entry in the dictionary is preferably represented by an entry having the relevant number moved to the end of the entry, the dictionary preferably not containing any data relating to premise numbers. The processing of the input data including a number, for example a number appearing at the beginning […], may include splitting the input term into two terms, one term including the number at the end and the other term excluding the number. Thus the processor is able to match postal address elements with input terms containing numbers whether or not the numbers are representative of premise numbers. Since the two split terms share the same stem, the processor is able to search the dictionary for the two terms in parallel without needing, when considering the characters in that stem, to access any more nodes than when searching for only one of the two input terms. Treating numbers in this way therefore […] increasing processing time and may even reduce the average […].
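The splitting of a term that begins with a number can be sketched as follows; the function name and the use of a single space before the moved number are assumptions. Because both variants start with the same stem, a trie walk for the two can share every node along that stem.

```python
import re


def split_number_term(term):
    """'3 Mile Road' -> ['Mile Road 3', 'Mile Road']: one variant with the number
    moved to the end (as non-premise numbers are assumed to be stored in the
    dictionary), one without it (in case it is a premise number)."""
    match = re.match(r"^\s*(\d+)\s+(.*)$", term)
    if not match:
        return [term]
    number, rest = match.groups()
    return [f"{rest} {number}", rest]
```

Terms without a leading number pass through unchanged as a one-element list.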

[0045] The processing of the input data may also include considering whether any given input term includes a set of characters more susceptible to (human) error than other sets of characters. The processor may be programmed to recognise such strings, each string being associated with one or more different strings with which it is commonly replaced in error when inputting the given string. For example, the string "STREET" or the string "LANE" might be inputted as part of the input data relating to a given address where the correct string is actually "ROAD". The processor is preferably programmed to search the dictionary in a manner that accounts for such a string as being replaceable with another conceptually similar string. For example, if the input term is "RED LION ROAD", the processor may be able to recognise that the string of characters "ROAD" might have been entered in error for the string "STREET".
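One way of accounting for commonly confused strings is to expand a term into alternatives before the dictionary is searched. The table below is hypothetical (it merely pairs the street descriptors mentioned in the text), and `descriptor_variants` is an illustrative name.

```python
# Hypothetical table of commonly confused street descriptors.
CONFUSABLE = {
    "ROAD": ["STREET", "LANE"],
    "STREET": ["ROAD", "LANE"],
    "LANE": ["ROAD", "STREET"],
}


def descriptor_variants(term):
    """Return the term plus versions with its final descriptor swapped for a
    conceptually similar one."""
    words = term.split()
    if not words or words[-1] not in CONFUSABLE:
        return [term]
    stem = " ".join(words[:-1])
    return [term] + [f"{stem} {alt}".strip() for alt in CONFUSABLE[words[-1]]]
```

`descriptor_variants("RED LION ROAD")` yields the original term followed by "RED LION STREET" and "RED LION LANE"; all three share the stem "RED LION", so they can be searched together.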
[0046] Advantageously, the processing of the input data includes ascertaining whether any of the input terms correspond to a category of postal address element and, if so, including an indication of the category in the input data. For example, the processor may be programmed to recognise whether an input data term is in a format corresponding to a postal code (or post-code, […] or the like), and if so to continue the method on the basis that the data term is such a postal code. The dictionary, or other aspects of the data used when performing the method, may be arranged and ordered by category, and thus the retrieval of an address may be made more efficient. There may, for example, be separate dictionaries for entries relating to postal elements of a given category.
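Recognising a postal-code-shaped term can be sketched with regular expressions. Both patterns below are simplified assumptions (a common approximation of the UK postcode shape and the US five- or nine-digit ZIP code), not validated formats, and `tag_terms` is a hypothetical name.

```python
import re

# Simplified, assumed patterns; real postal code validation is stricter.
POSTCODE_PATTERNS = {
    "uk_postcode": re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$"),
    "us_zip": re.compile(r"^[0-9]{5}(-[0-9]{4})?$"),
}


def tag_terms(terms):
    """Attach a category to each input term that looks like a postal code."""
    tagged = []
    for term in terms:
        candidate = term.upper().strip()
        category = next((name for name, pattern in POSTCODE_PATTERNS.items()
                         if pattern.match(candidate)), None)
        tagged.append((term, category))
    return tagged
```

A term tagged in this way could then be looked up only in a dictionary holding postal codes, rather than in one holding every category.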

[0047] The category may be based on other characteristics, such as, for example, the number of characters required to represent the postal address element.
[0048] The data used when performing the method, in particular the first and second databases, is preferably in electronic form. For example, the data may be stored on RAM, ROM, CD ROM, tape, magnetic disc or any other suitable electronic machine readable data […].
[0049] The input data may be entered manually by a user, for example via a keyboard or other manual data entry apparatus. The output data may initially be provided as a visual indication on a VDU. The output data may alternatively, or additionally (for example after a suitable confirmation is made by the user), be electronically pasted (i.e. inserted) into a separate data storage area on a computer system. For example, the output data may be pasted into an application running on a computer.
[0050] The input data may be taken from a separate data store. For example, the separate data store may consist of data stored in memory (whether RAM, ROM, CD ROM, on a hard drive or otherwise). The data store may include data relating to an existing database including postal address information. The output data may then be used to remove or to highlight errors in the postal address information in the existing database. The data store may simply relate to data used by, or in relation with, a separate application running on a computer.
[0051] According to a second aspect of the present invention there is also provided a method of retrieving data representing a postal address from a database representing a multiplicity of postal addresses, the method comprising the following steps:






i) providing a processor,

ii) providing a database, accessible by the processor, of data representing a multiplicity of postal addresses, a dictionary of terms corresponding to those found within the postal addresses, and information, for example location information, enabling the processor to ascertain the one or more postal addresses in the database having a term corresponding to each dictionary entry,
iii) providing the processor with input data for finding a postal address in the database, the processor then searching the dictionary for entries corresponding to the input data and ascertaining from the information whether any postal address in the database corresponds sufficiently closely to the input data, and

iv) outputting data relating to the results of step iii).



[0052] The database operated on during the performance of the method according to the second aspect of the invention may effectively comprise

a first data structure representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements, the first data structure comprising respective codes representing respective postal address elements,

a second data structure, in the form of a dictionary, comprising a multiplicity of entries, each entry corresponding to at least one postal address element represented by the data in the first data structure,

a third data structure linking each code in the first data structure to data from which the postal element represented by the code can be directly ascertained, and

a fourth data structure comprising data linking a given entry in the second data structure with each item of data in the first data structure representing the postal address element corresponding to the entry in the second data structure.

[0053] Alternatively, or additionally, the dictionary may be in the form of a tree data structure, having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of a term within a postal address in the database.
[0054] It will be readily appreciated by those skilled in the art that features of the first aspect of the present invention may be incorporated into the second aspect of the present invention and vice versa. In particular, one or more of the following features described above with reference to the first aspect may advantageously be incorporated into the second aspect: a) the dictionary being in the form of a trie data structure; b) those features relating to structuring the trie data structure of the dictionary to facilitate efficient searching and efficient use of data storage space (for example, by having nodes of the trie containing a plurality of portions of data relating to different entries represented in the dictionary and/or having the root node and at least its child nodes stored in fast memory); c) searching the dictionary for entries which are at a given edit distance from an input term; d) providing multiple dictionaries; e) performing searching in a dictionary for a plurality of input terms in parallel; f) the features relating to the preprocessing of input terms, for example to provide support for abbreviated street descriptors, support for numbers in addresses and other factors peculiar to postal address data; g) providing a hierarchical postal address database structure, for example a tree data structure; h) features relating to the address database including codes representative of address elements and the provision of a separate data store for decoding such codes; i) features relating to ascertaining the target postal address being searched for, including ascertaining which of a multiplicity of terms (or postal address elements) belong to the same postal address by considering, for example, information concerning the relative location (from off-set information) of each term (or postal address element) in the postal address database, such information preferably being provided as a separate data structure (which may, for example, be the fourth data structure mentioned above) enabling the processor to ascertain whether two terms are present in the same postal address without needing to access the postal address database (or first data structure).

[0055] For example, with reference to the second aspect of the invention, during step iii), the processor advantageously initially searches the dictionary for entries corresponding exactly to the input data and then, if one or more terms included in the input data are matched but other terms are not, the processor preferably continues the search in the dictionary for entries having a lower quality correspondence with those unmatched terms, whilst not searching for further entries in the dictionary for those terms where entries exactly matching those terms have already been found.
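The staged search of paragraph [0055] can be sketched as follows. This is a minimal illustration, not the patented implementation: the matcher callables, the toy dictionary and the simple one-substitution `fuzzy` stand-in are all hypothetical. The point it shows is that the costlier, lower-quality search is run only for terms that found no exact entry.

```python
from typing import Callable, Dict, List, Set

Matcher = Callable[[str], Set[str]]

def staged_search(terms: List[str], exact: Matcher, fuzzy: Matcher) -> Dict[str, Set[str]]:
    """Exact search for every term; fuzzy search only for terms with no exact hit."""
    results: Dict[str, Set[str]] = {}
    for term in terms:
        hits = exact(term)
        # An exactly matched term is treated as correct and is not searched again.
        results[term] = hits if hits else fuzzy(term)
    return results

# Hypothetical toy dictionary and matchers, for illustration only.
DICTIONARY = {"red", "lion", "street", "london"}
fuzzy_calls: List[str] = []

def exact(term: str) -> Set[str]:
    return {term} if term in DICTIONARY else set()

def fuzzy(term: str) -> Set[str]:
    # Toy stand-in: same-length entries differing in exactly one character.
    fuzzy_calls.append(term)
    return {e for e in DICTIONARY
            if len(e) == len(term) and sum(a != b for a, b in zip(e, term)) == 1}

print(staged_search(["lion", "lundon"], exact, fuzzy))
# {'lion': {'lion'}, 'lundon': {'london'}}
```

Here "lion" is matched exactly and never reaches the fuzzy matcher, while the misspelt "lundon" falls through to it.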
[0056] As mentioned above, the features described with reference to the first aspect of the present invention relating to the input data provided to the processor being processed or preprocessed before the dictionary is searched may be incorporated into the second aspect of the present invention. Thus, in step iii) the input data may be processed (or preprocessed) by the processor, for example, to reduce the likelihood of a postal address not being found through differences in syntax between the input data used to search the dictionary and the data representing the multiplicity of postal addresses of the database provided in step ii).
[0057] According to the first aspect of the invention there is also provided an apparatus for retrieving data representing a postal address from a database representing a multiplicity of postal addresses,

the apparatus including a computer processor,
the apparatus being provided with

a first database, accessible by the processor, comprising data representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements, and

a dictionary in the form of a second database, accessible by the processor, comprising data representing entries, each entry corresponding to at least one postal address element represented by the data of the first database, wherein

the dictionary is in the form of a tree data structure having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of an element of a postal address, and
the processor is programmed to be able

to receive input data comprising an input term for finding a postal address represented in the first database,

to search the dictionary for entries in the dictionary corresponding to an input term,
to ascertain information concerning data in the first database representing the or each element corresponding to the or each entry in the dictionary determined by the processor as corresponding to an input term, and
to output data representing the or each postal address, if any, represented by the first database determined by the processor as being in accordance with the input data.

[0058] The apparatus may, of course, be arranged to be able to perform a method according to the first aspect of the present invention.

[0059] According to the second aspect of the invention there is also provided apparatus for retrieving data representing a postal address from a database representing a multiplicity of postal addresses, the apparatus including a computer processor and one or more databases, accessible by the processor, of data representing a multiplicity of postal addresses, a dictionary of terms found within the postal addresses, and information enabling the processor to link a given entry in the dictionary with the one or more postal addresses in the database having a term corresponding to the dictionary entry,

the processor being programmed to be able to receive input data for finding a postal address in the database, to search the dictionary for entries corresponding to the input data, to ascertain if any postal address in the database corresponds sufficiently closely to the input data, and to send output data relating to one or more postal addresses in the database.
[0060] The apparatus may, of course, be arranged to 
be able to perform a method according to the second 
aspect of the present invention. 
[0061] The apparatus according to any aspect of the invention may, for example, be a conventional computer system loaded with the appropriate software and provided with the appropriate data.
[0062] The present invention yet further provides a computer program product executable in a processor to perform a method according to any aspect of the present invention as described above, when provided with the appropriate data for the programmed processor to operate on. The computer program product may take the form of a computer program stored on an electronic data carrier, such as a computer, ROM, RAM, CD-ROM, magnetic disc or tape or any other form of electronic recording media.

[0063] The present invention also provides such a computer program product together with a data product, the data product enabling a processor, once programmed with the computer program product, to perform the method according to any aspect of the present invention as described above. The data product may be in the form of data stored on an electronic data carrier, such as a computer, ROM, RAM, CD-ROM, magnetic disc or tape or any other form of electronic recording media.
[0064] It will be appreciated that the postal addresses represented by the data referred to above need not each represent a unique postal address in reality. For example, the postal address represented by the data may require the addition of the name of a person (an individual or a corporate body, for example) and/or the number or name of the relevant premises. Such data may of course be added manually to the output data before the output data is used to mail any items to the intended postal address.

[0065] According to yet another aspect of the present invention there is provided a data product, accessible by a computer processor, the data product including

data representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements,

a dictionary comprising data representing entries, each entry corresponding to at least one postal address element represented by data in the data product, wherein the dictionary is in the form of a tree data structure, having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of an element of a postal address, and

data linking a given entry in the dictionary with the one or more postal addresses in the data product having a term corresponding to the dictionary entry.
[0066] Such a data product advantageously enables 
a suitably programmed computer processor to search 
the dictionary for entries in the dictionary corresponding 
to an input term, and to find data in the first database 
representing the or each address element correspond- 
ing to the or each entry in the dictionary determined by 
the computer processor as corresponding to an input 
term, whereby the data product may be used to find a 
postal address represented by the data product in re- 
sponse to input data comprising one or more input 
terms. 
[0067] The present invention also provides a data product, accessible by a computer processor, the data product including

a first data structure representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements, the first data structure comprising respective codes representing respective postal address elements,

a second data structure, in the form of a dictionary, comprising a multiplicity of entries, each entry corresponding to at least one postal address element represented by the data in the first data structure,

a third data structure linking each code in the first data structure to data from which the postal address element represented by the code can be directly ascertained, and

a fourth data structure comprising data linking a given entry in the second data structure with each item of data in the first data structure representing the postal address element corresponding to the entry in the second data structure.

[0068] Such a data product advantageously enables a suitably programmed computer processor to search the second data structure for entries corresponding to an input term and, on finding an entry, to find data in the first database representing the or each address element corresponding to the or each entry in the dictionary determined by the computer processor as corresponding to an input term, whereby the data product may be used to find a postal address represented by the data product in response to input data comprising one or more input terms.

[0069] As has been mentioned above, providing a separate data store (the third data structure) in addition to a dictionary (the second data structure), by reference to which the codes (e.g. in the first data structure) representing postal address elements can be decoded, allows that separate data store to be designed to give the processor access to full and correctly formatted representations of the address elements, whilst the dictionary may be formed without distinction as to, for example, formatting, thereby facilitating more efficient and/or comprehensive searching for address elements corresponding to an input term.
[0070] The data products described above may of course be used in the method of the present invention as described above in relation to the first and/or the second aspects of the present invention. As such, the data products may be configured and arranged to be suitable for use in the above-described methods of the present invention, the data products thereby possibly incorporating any of the features described above in connection with those methods. For example, the coded postal address database structure (the first data structure) may be in the form of a tree data structure. Also, the dictionary (the second data structure) is preferably formed as a modified trie structure as described above. Furthermore, the fourth data structure preferably comprises, in respect of each entry of the second data structure, a) data, for example one or more pointers, relating to the location(s) of the node(s) in the first data structure corresponding to the entry, and b) off-set value data indicating the distance in memory from the or each of said node(s) in the first data structure to the next node at the same level, thereby enabling a processor to ascertain from the second and fourth data structures the locations in the first data structure of the nodes of the postal address or addresses being in accordance with given input data, without needing to access the first data structure.
[0071] Throughout the above general description of the invention, and below in the claims, various databases and data structures have been described in a way which might suggest that data is formed either as a unitary data collection or as a group of separate but interconnected data collections. As will be appreciated, there are many ways in which the present invention may be implemented, provided that the effective underlying structure of the computer program product, computer software and/or data is in accordance with the principles as set forth above.

[0072] By way of example, an embodiment of the invention will now be described with reference to the accompanying drawings, of which:

Figure 1 is a schematic block diagram giving an overview of operation of a system according to the invention,

Figure 2 is a schematic diagram illustrating how the dictionary of the system is arranged,
Figure 3 is a schematic diagram illustrating how postal address data is arranged within the system, and
Figures 4a and 4b are schematic diagrams illustrating in greater detail the operation of the system.

[0073] Figure 1 shows a system 1 comprising a processor 2 and a database 3. The database 3 comprises a dictionary 4, a location index 5, a coded postal address data store 6 and a postal address element decoding index 7. The processor 2 is able to access the data stored in the database 3, to receive input data 8, generally in the form of search terms relating to at least part of an address to be searched, and to send output data 9, generally in the form of a full and correct postal address.
[0074] The coded postal address data store 6 includes representations, in the form of codes, of a multiplicity of postal addresses, each postal address being formed of at least one postal address element. For example, a postal address may comprise a premise name element, a house number element, a street name element, a town element, a county element and a postal code element (such a postal address thus consisting of six postal address elements). The actual address elements, being represented in the coded postal address data store 6 as codes, are able to be decoded by the processor 2 with reference to the postal address element decoding index 7.

[0075] The dictionary 4 comprises entries relating to each different postal address element occurring in the index 7. Each entry in the dictionary 4 may therefore correspond to many different entries within the coded postal address data store 6. The location of each entry in the coded postal address data store 6 corresponding to a dictionary entry can be ascertained by the processor 2 by reference to the location index 5.
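The coding scheme of paragraphs [0074] and [0075] can be sketched as below: each distinct address element is stored once in a decoding table and the addresses themselves hold only integer codes. This is a minimal sketch under stated assumptions: the codes are assigned in order of first appearance (a hypothetical scheme), and the addresses are kept as flat lists, whereas the embodiment stores them as a tree.

```python
from typing import Dict, List, Tuple

def build_coded_store(addresses: List[List[str]]) -> Tuple[List[List[int]], Dict[int, str]]:
    """Replace each address element by an integer code and build the code -> element table."""
    code_of: Dict[str, int] = {}
    decoding_index: Dict[int, str] = {}
    coded: List[List[int]] = []
    for address in addresses:
        row: List[int] = []
        for element in address:
            if element not in code_of:
                # Hypothetical scheme: codes assigned in order of first appearance.
                code = len(code_of)
                code_of[element] = code
                decoding_index[code] = element
            row.append(code_of[element])
        coded.append(row)
    return coded, decoding_index

def decode(row: List[int], decoding_index: Dict[int, str]) -> List[str]:
    """Turn a coded address back into its full, formatted elements."""
    return [decoding_index[code] for code in row]

coded, index = build_coded_store([["Red Lion Street", "London"], ["High Holborn", "London"]])
print(coded)                    # [[0, 1], [2, 1]]
print(decode(coded[1], index))  # ['High Holborn', 'London']
```

"London" is held once in the decoding table although it occurs in both addresses, which is why a single dictionary entry may correspond to many entries in the coded store.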
[0076] The operation of the system 1 may be summarised with reference to Figure 1 as follows. A user enters input terms 8, as strings of characters, which are received by the processor 2. The processor preprocesses the input terms 8 (as will be explained in further detail later) and then searches the dictionary 4 for entries corresponding to the input terms 8. On finding entries in the dictionary 4 corresponding to the input terms 8, the processor 2 then ascertains, by means of the location index 5, the locations in the coded postal address data store 6 corresponding to the dictionary entries matching the input terms 8. If the processor 2 ascertains that there is a single postal address represented in the coded postal address data store 6 with postal address elements matching all of the input terms 8, then the processor 2 decodes the data in data store 6 corresponding to the postal address by reference to the postal address element decoding index 7. The results are then returned to the user as output data 9. The output data 9 can, for example, be displayed on a VDU (not shown) and may be pasted into whichever application on the computer system the user wishes to have the address output data entered.

[0077] For example, if the user enters the input terms "RED LION STREET" and "LONDON", entries in the database corresponding to the addresses "Red Lion Street, Southampton" and "High Holborn, London" will each contain only one match for the input terms 8 entered, but the address "Red Lion Street, London" would have two matches and would be chosen by the processor 2 as the appropriate address to be returned to the user as the output data 9.
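The selection rule in this example (choose the address with the most matched input terms) can be sketched as follows. This is only the scoring step under simplifying assumptions: it scans a hypothetical in-memory table of addresses directly and compares whole elements case-insensitively, whereas the embodiment reaches the candidate addresses through the dictionary 4 and the location index 5.

```python
from typing import Dict, List

def match_count(terms: List[str], elements: List[str]) -> int:
    """Number of input terms equal to some element of the address (case-insensitive)."""
    normalised = {e.upper() for e in elements}
    return sum(1 for term in terms if term.upper() in normalised)

def best_address(terms: List[str], addresses: Dict[str, List[str]]) -> str:
    # max() keeps the first address seen when several share the top score.
    return max(addresses, key=lambda name: match_count(terms, addresses[name]))

# Hypothetical address table built from the example in the text.
ADDRESSES = {
    "Red Lion Street, Southampton": ["RED LION STREET", "SOUTHAMPTON"],
    "High Holborn, London": ["HIGH HOLBORN", "LONDON"],
    "Red Lion Street, London": ["RED LION STREET", "LONDON"],
}
print(best_address(["RED LION STREET", "LONDON"], ADDRESSES))  # Red Lion Street, London
```

The first two addresses score one match each and the third scores two, so it is returned.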

[0078] Figure 2 shows schematically the arrangement of the dictionary 4. The dictionary 4 is arranged as a tree structure, having a root node 10 and terminating in a multiplicity of leaves 11, the path from the root node 10 to a leaf 11 being representative of a postal address element. The dictionary structure may be described as a modified trie structure. In a conventional trie structure each node of the tree represents a single character of a word, the path from the root to a leaf spelling out the word represented by the leaf. The present data structure, however, has nodes comprising the letter or letters represented by their child nodes. The structure may be thought of as a trie structure in which the characters of each node have been promoted to the node above (the parent node), each node thus possibly representing many characters (or a single string of characters, as discussed below) but each character being associated with only one branch to a lower level. Thus the root node 10 of the present data structure includes the initial characters of all of the entries in the dictionary, and the nodes on the next level down each contain the second letters of entries in the dictionary with a given first letter. For example, in Figure 2, node 12a, pointed to by pointer 15a associated with the letter "B" of node 10, contains details of the second letters of all entries in the dictionary starting with the letter "B". In other words, the tree is arranged such that nodes effectively represent a single letter, but the information concerning what that letter is, is held in the parent node together with information concerning other sibling nodes.
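The "promoted" trie of paragraph [0078] can be sketched as a node that stores the characters of all its children, each character tied to exactly one lower branch. This is a minimal sketch, not the product's storage format: the parallel `chars`/`children` lists and the `$` end-of-string marker are assumptions (the symbol used in Figure 2 is not reproduced here).

```python
from typing import List, Optional

END = "$"  # hypothetical end-of-string marker

class Node:
    """Trie node holding the characters of all its children, each tied to one lower branch."""

    def __init__(self) -> None:
        self.chars: List[str] = []
        self.children: List[Optional["Node"]] = []  # None for the END marker

    def child_for(self, ch: str) -> Optional["Node"]:
        for c, child in zip(self.chars, self.children):
            if c == ch:
                return child
        return None

def insert(root: Node, word: str) -> None:
    node = root
    for ch in word:
        if ch not in node.chars:
            node.chars.append(ch)
            node.children.append(Node())
        node = node.child_for(ch)
    if END not in node.chars:  # mark that an entry ends here
        node.chars.append(END)
        node.children.append(None)

def contains(root: Node, word: str) -> bool:
    node: Optional[Node] = root
    for ch in word:
        node = node.child_for(ch)
        if node is None:
            return False
    return END in node.chars

root = Node()
for word in ["BOTANY", "BOW", "TO", "TOW", "STOW"]:
    insert(root, word)
print(root.chars)                                 # ['B', 'T', 'S']
print(root.child_for("B").child_for("O").chars)   # ['T', 'W']
```

The root holds the first letters of every entry, and the node reached via "B" then "O" holds the third letters of all entries starting "BO", mirroring the description of node 12a.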

[0079] One important exception to the nodes each representing one or more single letters is shown in Figure 2. Node 11d includes the letters "ANY", so that the path from the root node reads "BOTANY". The characters at the end of the entry are combined into a single leaf, rather than having a string of single child nodes terminating in a single leaf. The data space required to hold the dictionary 4 may thus be reduced. Node 11d is, as can be seen from Figure 2, a leaf node, but it is possible for the dictionary to comprise nodes that are not leaf nodes, where the node represents a plurality of characters representing dictionary entries sharing the same stem followed by those characters, and possibly other characters thereafter. The dictionary is arranged such that only single child nodes contain such a string of characters.
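Merging the tail of an entry into one leaf, as for node 11d, can be sketched with a nested-dict trie. This is a hedged illustration under assumptions: the dict representation and the `$` end marker are hypothetical, and only chains of single-child nodes that run to the end of an entry are merged; the non-leaf multi-character nodes mentioned above are not covered.

```python
from typing import Dict, Iterator, Optional

END = "$"  # hypothetical end-of-string marker
Trie = Dict[str, "Trie"]

def build(words) -> Trie:
    root: Trie = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root

def single_chain(node: Trie) -> Optional[str]:
    """Spell out a chain of single-child nodes if it runs straight to the end of an entry."""
    spelled = ""
    while len(node) == 1:
        (label, node), = node.items()
        spelled += label
    return spelled if not node and spelled.endswith(END) else None

def compress_tails(node: Trie) -> Trie:
    out: Trie = {}
    for label, child in node.items():
        tail = single_chain(child)
        if tail is not None and len(tail) > 2:  # at least two characters before END
            out[label] = {tail: {}}  # one leaf now holds the whole remaining string
        else:
            out[label] = compress_tails(child)
    return out

def entries(node: Trie, prefix: str = "") -> Iterator[str]:
    """Read every entry back out, to check that compression lost nothing."""
    for label, child in node.items():
        if label.endswith(END):
            yield prefix + label[:-1]
        else:
            yield from entries(child, prefix + label)

compressed = compress_tails(build(["BOTANY", "BOW"]))
print(compressed)  # {'B': {'O': {'T': {'ANY$': {}}, 'W': {'$': {}}}}}
```

The path B, O, T now ends in a single leaf holding "ANY", so four single-child nodes collapse into one while "BOW" keeps its ordinary structure.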

[0080] Figure 3 shows schematically how data is arranged in the coded postal address data store 6. The data store 6 is stored as a tree, each node representing a postal address element, and the path from the root node 16 to a leaf node 21 representing a postal address. The tree structure shown in Figure 3 is arranged such that the regions represented by nodes within the tree become smaller the closer the node is to a leaf node 21. Nodes 17 in the level below the root node 16 represent a county, nodes 18 on the level below that represent towns, the nodes 19 below that represent street names, the nodes 20 below that represent postal codes and the leaf nodes 21 represent house numbers or names. Rather than representing each character of the postal address element represented by a node, the nodes contain codes representative of a postal address element. For example, if node 18a represents a town named "LONDONWAY" and the node 19b represents a street also named "LONDONWAY", the contents of both nodes 18a and 19b would include a code representative of the word "LONDONWAY". The processor 2 is able to decode the codes in the coded postal address store 6 by reference to the postal address element decoding index 7.

[0081] The nodes in the coded postal address store 6 are actually stored in memory (whether RAM, ROM or otherwise) as a linear data store. The data store 6 is arranged linearly in memory, each node being immediately followed by its children, so that children may be separated by their own children, if any, but not by nodes in a level closer to the root node. For example, if node A has child nodes B and C, child node B having child nodes D and E and child node C having child nodes F and G, the order of storage of those nodes would be A, B, D, E, C, F, G.
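The storage order described in paragraph [0081] is a pre-order traversal. A minimal sketch, assuming the tree is given as a hypothetical mapping from each node to its ordered children, reproduces the A, B, D, E, C, F, G example:

```python
from typing import Dict, List

def linearise(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Pre-order layout: each node is stored immediately before its whole subtree."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so that the first child is popped (and stored) first.
        stack.extend(reversed(tree.get(node, [])))
    return order

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
print(linearise(tree, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F', 'G']
```

An explicit stack is used rather than recursion so that a deep address hierarchy cannot exhaust Python's recursion limit.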
[0082] As mentioned above, the location index 5 includes data concerning the locations, in the linear data store of the postal address store 6, of the nodes corresponding to each dictionary entry. For each node location the location index 5 also includes information regarding the location in the linear data store of the next node after the node's children and their descendants. Said information in the location index 5 (i.e. in respect of the location of a given node in the address data store 6, the information regarding the location of the node immediately after the last of its direct descendants, if any) is in the form of an offset value (i.e. a value representative of the distance in memory between the two nodes in the linear data store). Thus the processor 2 is able to determine, with reference to the location index 5, whether or not two nodes in the data store 6 relate to the same postal address, by calculating whether the node stored further along in the data store 6 is within a distance less than the offset distance associated with the first (in location) of the two nodes. Such an assessment is facilitated by the way in which the nodes of the postal address data store 6 are stored (see above). Thus the processor 2 is able to determine which of those occurrences within the postal address data store 6 (derived from the matched dictionary entries) belong to the same postal address with reference to the location index 5 and without needing to access the data store 6 itself.
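The offset test of paragraph [0082] can be sketched as follows. Under the assumptions of this sketch, the offset of a node is the distance from that node to the first node after its subtree in the pre-order layout, and the tree is supplied as a hypothetical node-to-children mapping; two positions then lie on one root-to-leaf path (one postal address) exactly when the later one falls inside the earlier one's subtree.

```python
from typing import Dict, List, Tuple

def layout_with_offsets(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], List[int]]:
    """Pre-order layout plus, per node, the distance to the node after its subtree."""
    order: List[str] = []
    offsets: List[int] = []

    def visit(node: str) -> None:
        index = len(order)
        order.append(node)
        offsets.append(0)  # placeholder until the subtree has been laid out
        for child in tree.get(node, []):
            visit(child)
        offsets[index] = len(order) - index

    visit(root)
    return order, offsets

def same_address(i: int, j: int, offsets: List[int]) -> bool:
    """True if the later of two positions falls inside the subtree of the earlier one."""
    first, second = min(i, j), max(i, j)
    return second - first < offsets[first]

order, offsets = layout_with_offsets({"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}, "A")
print(order)    # ['A', 'B', 'D', 'E', 'C', 'F', 'G']
print(offsets)  # [7, 3, 1, 1, 3, 1, 1]
print(same_address(order.index("B"), order.index("E"), offsets))  # True
print(same_address(order.index("E"), order.index("F"), offsets))  # False
```

Only positions and offsets are compared, so the test needs nothing from the coded store itself, which is the property the paragraph relies on.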
[0083] When input terms 8 are entered that relate to many occurrences within the coded postal address data store 6, the processor 2 ascertains whether there is a single address containing an occurrence corresponding to each term 8 entered (or which, if any, of the addresses have the most occurrences compared with the other addresses).

[0084] The searching of the dictionary 4 by the processor 2 will now be described in more detail with reference to Figure 2. Firstly, the input data terms 8a and 8b are preprocessed by the processor to convert all upper case letters into lower case letters, to remove non-alphanumeric characters, including all punctuation marks (including, for example, space characters, apostrophes, quote marks, full stops and the like), and to expand abbreviations (for example, expanding "ST" at the end of an input term to "STREET", expanding "RD" to "ROAD", and expanding "N" to "NORTH", "W" to "WEST", and so on).
[0085] If the input term includes an ambiguous abbreviation, the processor splits the term into two terms, one in the abbreviated form and one in the expanded form. Splitting a term into two can avoid failing to match an input data string with a postal address element where the postal address element concerned contains a letter that is not in fact an abbreviation (for example, there may be premises known as "The Big W"). If the term is not ambiguous, the processor may not split the input term into two (for example, it may be assumed that "RD." is an unambiguous abbreviation for "ROAD"). Also, if the input data term starts with a number, that number is removed from the beginning of the data term and sent to the end. Moving numbers to the end of data strings facilitates better searching of the dictionary, where numbers are also represented at the end of the entries. The dictionary 4 is also formed in such a way that all premise numbers are excluded from the dictionary, to facilitate more efficient searching. Again, if the input data term includes a number, the processor may search for entries in the dictionary corresponding both to the data term with the number moved to the end and to the data term with the number removed.
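The preprocessing of paragraphs [0084] and [0085] can be sketched as a function that turns one input term into its search variants. The abbreviation table is hypothetical: "RD" follows the text's unambiguous example, "ST" is treated as unambiguous here as an assumption, and "N" and "W" are split into both forms. As a simplification, abbreviations are expanded wherever they stand as separate words, not only at the end of the term.

```python
import itertools
import re
from typing import List

# Hypothetical abbreviation table: a single option means "always expand",
# two options mean "ambiguous, search both forms".
ABBREVIATIONS = {
    "st": ["street"],
    "rd": ["road"],
    "n": ["n", "north"],
    "w": ["w", "west"],
}

def preprocess(term: str) -> List[str]:
    """Return the normalised search variants of one input term."""
    # Lower-case and split on anything that is not a letter or digit.
    words = re.findall(r"[a-z0-9]+", term.lower())
    number_variants = [words]
    if words and words[0].isdigit():
        # Move a leading number to the end, and also try the term without it.
        number_variants = [words[1:] + words[:1], words[1:]]
    variants: List[str] = []
    for ws in number_variants:
        options = [ABBREVIATIONS.get(w, [w]) for w in ws]
        for combo in itertools.product(*options):
            joined = "".join(combo)  # spaces go too, like other non-alphanumerics
            if joined and joined not in variants:
                variants.append(joined)
    return variants

print(preprocess("Red Lion St."))    # ['redlionstreet']
print(preprocess("10 Downing Rd."))  # ['downingroad10', 'downingroad']
print(preprocess("The Big W"))       # ['thebigw', 'thebigwest']
```

Each variant would then be looked up in the dictionary separately, so "The Big W" can still match premises whose name really ends in the letter W.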

[0086] The input data terms may also be accompanied by data specifying the postal address element type to be searched in relation to a given data term. For example, the user may specify that one of the data terms entered is a postal code, the processor then being able to ignore matching data of a different type during the subsequent searching.

[0087] If the processor 2 were instructed to search for an entry in the dictionary identical to the input term "TOW", the processor would start at the root node 10 and find the first letter "T" of "TOW", pointing (pointer 15c) to node 12c, where the letter "O" would be found, which in turn is associated with a pointer 15b pointing to node 13c, where the letter "W" would be found, which is associated with a pointer 15d pointing to node 11c, a leaf node. The leaf node is, in this case, an end-of-string character, because in the dictionary illustrated there are no other entries sharing the stem "TOW". The leaf node 11c points (pointer 15e) to a position in the location index 5 where data concerning the occurrences in the coded postal address data store 6 corresponding to "TOW" (the entry found in the dictionary) is provided. There may of course be more than one such occurrence in the data store 6.
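The exact lookup walked through in paragraph [0087] can be sketched with a nested-dict trie in which the end-of-string key of each entry holds that entry's location-index data. This is a sketch under assumptions: the `$` key and the integer store positions are hypothetical, and the dict stands in for the node and pointer layout of Figure 2.

```python
from typing import Dict, List, Optional

END = "$"  # hypothetical end-of-string key

def build_trie(entries: Dict[str, List[int]]) -> dict:
    """Nested-dict trie; the END key of each entry holds its location-index data."""
    root: dict = {}
    for word, locations in entries.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = locations
    return root

def lookup(root: dict, term: str) -> Optional[List[int]]:
    """Follow one branch per character; return the entry's locations, or None if absent."""
    node = root
    for ch in term:
        if ch not in node:
            return None
        node = node[ch]
    return node.get(END)

# Hypothetical store positions for each dictionary entry.
trie = build_trie({"TO": [45], "TOW": [120, 388], "BOW": [7]})
print(lookup(trie, "TOW"))   # [120, 388]
print(lookup(trie, "TOWN"))  # None
```

A single entry can map to several positions, matching the note that "TOW" may occur more than once in the data store 6.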
[0088] If an input data term 8 does not correspond exactly with an entry in the dictionary 4, the processor 2 will search the dictionary again allowing for one error. A single error, for the present purpose, is counted as a substitution of a character, a deletion of a character or an addition of a character. Allowing for one such error, the input data term 8 "TOW" would, in respect of the dictionary illustrated in Figure 2, yield "BOW", "STOW" and "TO", in addition to "TOW". In the case of "BOW" the letter "T" has been substituted with the letter "B", in the case of "STOW" the letter "S" has been added, and in the case of "TO" the letter "W" has been deleted. (It will be noted that node 13c contains other characters as well as an end-of-string character because there are, in addition to the word "TO", other words sharing the stem "TO", such as, for example, "TOW". The node 13c therefore effectively acts, in part, as a leaf node.) The processor 2, if unsuccessful in finding a postal address corresponding to the input terms 8, may allow for further errors in one or more of the input terms. If one input term 8 is matched without error, it is assumed that such an input term 8 is correct, unless, that is, it becomes apparent to either the processor 2 or the user that the input term is not correct. In most situations, assuming that an exactly matched input term is correct saves time that would otherwise be spent searching for close, but not exact, matches that would turn out to be irrelevant.
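Searching the dictionary "allowing for one error" can be sketched as a depth-first walk of the trie that carries one row of the Levenshtein edit-distance table per level. This is a well-known technique chosen for illustration; the text does not say how the product computes the error count, and the nested-dict trie with a `$` key holding the full entry is an assumption.

```python
from typing import Dict, List

END = "$"  # hypothetical end-of-string key; its value is the full entry

def build_trie(words: List[str]) -> Dict:
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root

def search(root: Dict, term: str, max_errors: int = 1) -> List[str]:
    """Entries within max_errors substitutions, additions or deletions of term."""
    found: List[str] = []

    def walk(node: Dict, row: List[int]) -> None:
        for ch, child in node.items():
            if ch == END:
                continue
            # new_row[i] is the edit distance between (path + ch) and term[:i].
            new_row = [row[0] + 1]
            for i in range(1, len(term) + 1):
                cost = 0 if term[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1, row[i] + 1, row[i - 1] + cost))
            if END in child and new_row[-1] <= max_errors:
                found.append(child[END])
            # Row minima never decrease further down, so prune hopeless branches.
            if min(new_row) <= max_errors:
                walk(child, new_row)

    walk(root, list(range(len(term) + 1)))
    return found

trie = build_trie(["BOTANY", "BOW", "STOW", "TO", "TOW"])
print(sorted(search(trie, "TOW")))  # ['BOW', 'STOW', 'TO', 'TOW']
```

With the dictionary of the example, one error admits "BOW" (substitution), "STOW" (addition) and "TO" (deletion), while the "BOTANY" branch is abandoned after three letters. Setting `max_errors=0` reduces this to the exact search.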
[0089] Figures 4a and 4b illustrate, with schematic diagrams, a search for an address 30 within the coded postal address data store 6. As shown in Figure 4a, two input terms 8a, 8b are input to the processor (not shown in Figure 4a) and are then searched for in the dictionary 4. The results from the search of the dictionary 4 include a match 31 identical to term 8b but no matches identical to term 8a. The processor 2 then searches the dictionary 4 again for entries corresponding to term 8a, but allowing for one error (any of a deletion, addition or substitution). That search reveals three matches 32a, 32b and 32c. The processor then ascertains, via the location index 5, the locations of the entries in the coded postal address data store 6 corresponding to the matches found. As shown in Figure 4a, two entries 27a and 27b in the data store 6 are found relating to dictionary entry 31, and there are three entries 28a, 28b, 28c in the data store 6, each one corresponding to one of the three dictionary entries 32a, 32b and 32c. From data in the location index 5 the processor is able to ascertain that the entries 27a and 28c in data store 6, corresponding to dictionary entries 31 and 32c, are located within a group of data representing address 30. The processor then decides that this is the address corresponding to the input terms 8a and 8b and then accesses the data store 6. With reference to Figure 4b, the processor then decodes the codes in the nodes 29, 28c, 27a relating to the address 30 held in the data store 6 with reference to the postal address element decoding index 7. The decoding index 7 includes entries 7a, 7b, 7c, 7d enabling the processor to ascertain the full postal address element represented by a given code. The processor is thus able to output the full and correctly formatted address 9, comprising address elements 9a, 9b, 9c corresponding to the nodes in the coded address store 6.
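The two stages of this worked example, choosing one occurrence per input term so that all lie in one address (using offsets only) and then decoding that address, can be sketched as below. Everything here is a hypothetical toy: the five-node store, its codes and offsets, and the decoding table. The offset convention (distance to the first node after a node's subtree in the pre-order layout) is an assumption of this sketch, and recovering the ancestors of the deepest match from offsets is one possible way to find the rest of the address, not necessarily the product's.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical pre-order store: (code, offset to the node after this node's subtree).
STORE: List[Tuple[int, int]] = [
    (0, 3),  # 0: LONDON
    (1, 1),  # 1:   RED LION STREET
    (2, 1),  # 2:   HIGH HOLBORN
    (3, 2),  # 3: SOUTHAMPTON
    (1, 1),  # 4:   RED LION STREET
]
OFFSETS = [offset for _, offset in STORE]
DECODING_INDEX: Dict[int, str] = {0: "LONDON", 1: "RED LION STREET", 2: "HIGH HOLBORN", 3: "SOUTHAMPTON"}

def on_one_path(positions: Sequence[int], offsets: Sequence[int]) -> bool:
    """True if every position lies inside the subtree of the one stored before it."""
    ps = sorted(positions)
    return all(b - a < offsets[a] for a, b in zip(ps, ps[1:]))

def find_address(candidates: List[List[int]], offsets: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Pick one store position per input term so that all picks share one address."""
    for combo in product(*candidates):
        if on_one_path(combo, offsets):
            return combo
    return None

def decode_path(deepest: int, offsets: Sequence[int]) -> List[str]:
    """Decode the deepest matched node together with all of its ancestors."""
    path = [q for q in range(deepest + 1) if deepest - q < offsets[q]]
    return [DECODING_INDEX[STORE[q][0]] for q in path]

# Location-index results for the terms "RED LION STREET" and "LONDON".
combo = find_address([[1, 4], [0]], OFFSETS)
print(combo)                             # (1, 0)
print(decode_path(max(combo), OFFSETS))  # ['LONDON', 'RED LION STREET']
```

As in the text, the store is only read (here, through `STORE` in `decode_path`) once a single consistent combination of occurrences has been found.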
[0090] As mentioned above, it will be appreciated that there are many ways in which the present invention may be implemented, provided that the underlying structure of the computer program product, computer software and/or data is in accordance with the principles as set forth above. It will also be understood that the invention is not limited to the embodiment described above with reference to the drawings, but is capable of numerous rearrangements, substitutions and modifications without departing from the spirit of the invention. Such alternatives will be readily apparent to those skilled in the art and are encompassed within the spirit of the invention and the scope of the appended claims.

[0091] For example, the dictionary entries and the preprocessing performed on input terms may differ from country to country. In some countries, for example, it is common for numbers other than premise numbers to form part of the address, and these may need to be treated differently. In other countries, non-alphanumeric characters may also have greater significance than in countries such as the UK, where those characters may effectively be ignored when searching the dictionary.

Claims 

1. A method of retrieving data representing a postal address from a database representing a multiplicity of postal addresses, the method comprising the following steps:

a) providing a first machine-readable database comprising data representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements,

b) providing a dictionary in the form of a second machine-readable database, the dictionary comprising data representing entries, each entry corresponding to at least one postal address element represented by the data of the first database,

c) providing a processor able to access the data stored in the first and second machine-readable databases,

d) the processor receiving input data comprising one or more input terms for finding a postal address represented in the first database,

e) the processor searching the dictionary for entries in the dictionary corresponding to the one or more input terms,

f) the processor ascertaining information concerning data in the first database representing the or each postal address element corresponding to the or each entry in the dictionary determined by the processor as corresponding to the one or more input terms, and

g) the processor outputting data representing the or each postal address, if any, represented by the first database determined by the processor in view of the information ascertained in step f) as being in accordance with the input data, wherein

the dictionary is in the form of a tree data structure, having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of an element of a postal address.

2. A method according to claim 1, wherein the first database of data representing a multiplicity of postal addresses is formed as a tree data structure having a root node and terminating in a multiplicity of leaves, the path from a root node to a leaf being representative of a postal address.

3. A method according to claim 2, wherein there is provided data enabling the processor to determine whether a pair of nodes of the first database, at different locations in the tree data structure, relate to the same postal address.

4. A method according to any preceding claim, wherein the data representing each postal address element in the first database comprises a code for the address element.

5. A method according to any preceding claim, wherein the dictionary is so arranged that nodes of the tree data structure after the root node each contain a portion of the or each entry in the dictionary sharing the stem defined by the path from the root to that node, the or each portion being after the stem, and a plurality of the nodes have a plurality of such portions.

6. A method according to claim 5, wherein each portion either acts as a termination point or has a single path leading from it to another node.

7. A method according to claim 5 or claim 6, wherein each said portion of at least some nodes of said plurality of the nodes is a single character.

8. A method according to any of claims 5 to 7, wherein the dictionary is so arranged that at least some of said portions of the nodes each comprise a plurality of characters.
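Claims 5 to 8 describe what is commonly called a compressed (radix) trie: each node holds the portions of entries that follow its stem, each portion either terminates an entry or leads to a single further node, and portions may be one or several characters long. A minimal Python sketch of that arrangement follows; `RadixDictionary` and its methods are hypothetical names, and the edge-splitting strategy is a standard radix-tree technique, not one taken from this specification.

```python
class RadixNode:
    def __init__(self):
        # portion (one or more characters) -> child node; portions at one node
        # start with distinct characters, and each leads to exactly one node
        self.edges = {}
        self.terminal = False  # True if the stem ending here is a dictionary entry

def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

class RadixDictionary:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, word):
        node = self.root
        while word:
            for portion, child in list(node.edges.items()):
                k = common_prefix_len(portion, word)
                if k == 0:
                    continue
                if k < len(portion):
                    # Split the portion so the shared stem gets its own node.
                    mid = RadixNode()
                    mid.edges[portion[k:]] = child
                    del node.edges[portion]
                    node.edges[portion[:k]] = mid
                    child = mid
                node, word = child, word[k:]
                break
            else:
                # No portion shares a first character: add the rest as one portion.
                leaf = RadixNode()
                node.edges[word] = leaf
                node, word = leaf, ""
        node.terminal = True

    def contains(self, word):
        node = self.root
        while word:
            for portion, child in node.edges.items():
                if word.startswith(portion):
                    node, word = child, word[len(portion):]
                    break
            else:
                return False
        return node.terminal

d = RadixDictionary()
for w in ["STREET", "STATION", "STRAND"]:
    d.insert(w)
print(d.contains("STRAND"), d.contains("STR"))  # True False
```

After these three insertions the root holds the single portion "ST", whose node holds "ATION" and "R", and the "R" node holds "EET" and "AND", so single-character and multi-character portions coexist as claims 7 and 8 allow.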



9. A method according to any preceding claim, wherein steps f) and g) include ascertaining the location of each occurrence of data within the first database corresponding to the or each entry in the dictionary, determined by the processor as corresponding to the one or more input terms, and determining from the locations so ascertained the postal address or addresses being in accordance with the input data.

10. A method according to any preceding claim, wherein step e) includes the processor initially searching the dictionary for any entry in the dictionary identical to the or each input term.

11. A method according to claim 10, wherein if during step e) no entry is found that is identical to any of the input terms, the processor searches the dictionary for entries having a lower quality correspondence with the or each input term.



12. A method according to claim 10 or claim 11, wherein if a plurality of input terms are inputted and the processor finds during step e) one or more entries in the dictionary identical to at least one, but not all, of the input terms, respectively, thereby leaving one or more unmatched input terms, then the processor continues searching the dictionary for entries having a lower quality correspondence with those unmatched input terms.
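One possible reading of claims 10 to 12 is a two-pass search: exact dictionary matches first, then a looser pass only for the terms left unmatched. The specification does not define "lower quality correspondence" at this point, so the sketch below assumes Levenshtein edit distance with a hypothetical threshold `max_distance`; a plain set of strings stands in for the tree-structured dictionary, and `match_terms` is an illustrative name.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def match_terms(terms, entries, max_distance=1):
    """Exact pass first; only the unmatched terms get the looser pass."""
    matches = {}
    unmatched = []
    for term in terms:
        if term in entries:
            matches[term] = [term]
        else:
            unmatched.append(term)
    for term in unmatched:
        matches[term] = sorted(e for e in entries
                               if edit_distance(term, e) <= max_distance)
    return matches

ENTRIES = {"HIGH", "STREET", "STATION", "LONDON", "LEEDS"}
print(match_terms(["HIGH", "STREAT", "LONDN"], ENTRIES))
# {'HIGH': ['HIGH'], 'STREAT': ['STREET'], 'LONDN': ['LONDON']}
```

When no term matches exactly (claim 11), every term simply falls through to the second pass.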


13. A method according to any preceding claim, wherein the postal address elements forming a postal address are notionally divided into categories and there is provided data enabling the processor to ascertain the category of a given postal address element represented by data in the first database.

14. A method according to claim 13, wherein the processor is programmed to be able to be provided with input data including an indication of the category of postal address element that each of at least one of the input terms represents.

15. A method according to any preceding claim, wherein the input data received by the processor in step d) is processed by the processor before step e) is performed.

16. A method according to claim 15, wherein the processing of the input data is for the purpose of reducing the likelihood of a postal address not being found through differences in data syntax between the input terms and the postal address elements represented by the data of either or both of the first and second databases.
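The pre-search processing of claims 15 and 16 might, for example, uppercase the input, strip punctuation and expand common abbreviations so that input terms share the syntax of the dictionary entries. The abbreviation table and the `normalise` function below are hypothetical; in particular, expanding "ST" unconditionally to "STREET" is a simplification, since the same abbreviation can stand for other words.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"RD": "ROAD", "ST": "STREET", "AVE": "AVENUE"}

def normalise(raw):
    """Split raw input into terms, uppercase them, drop punctuation, expand abbreviations."""
    terms = re.findall(r"[A-Z0-9]+", raw.upper())
    return [ABBREVIATIONS.get(t, t) for t in terms]

print(normalise("10, High St."))  # ['10', 'HIGH', 'STREET']
```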

17. A method according to claim 15 or claim 16, when dependent on claim 14, wherein the processing of the input data includes ascertaining whether any of the input terms correspond to a category of postal address element and, if so, including an indication of the category in the input data.



18. A method according to any preceding claim, wherein the input data is entered manually by a user.

19. A method according to any of claims 1 to 17, wherein the input data is taken from a separate data store.

20. A method of retrieving data representing a postal address from a database representing a multiplicity of postal addresses, the method comprising the following steps:

i) providing a processor,

ii) providing a database, accessible by the processor, of data representing a multiplicity of postal addresses, a dictionary of terms corresponding to those found within the postal addresses, and information enabling the processor to ascertain the one or more postal addresses in the database having a term corresponding to each dictionary entry,

iii) providing the processor with input data for finding a postal address in the database, the processor then searching the dictionary for entries corresponding to the input data and ascertaining from the information if any postal address in the database corresponds sufficiently closely to the input data, and

iv) outputting data relating to the results of step 
iii). 

21. A method according to claim 20, wherein the dictionary is in the form of a tree data structure, having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of a term within a postal address in the database.

22. Apparatus for retrieving data representing a postal 
address from a database representing a multiplicity 
of postal addresses, 

the apparatus including a computer processor,

the apparatus being provided with 

a first database, accessible by the processor, 
comprising data representing a multiplicity of postal 
addresses, each postal address being formed of 
one or more postal address elements, and 

a dictionary in the form of a second database, accessible by the processor, comprising data representing entries, each entry corresponding to at least one postal address element represented by the data of the first database, wherein

the dictionary is in the form of a tree data 
structure, having a root node and terminating in a 
multiplicity of leaves, the path from the root node to 
a leaf being representative of an element of a postal 
address, and 

the processor is programmed to be able 

to receive input data comprising an input term 
for finding a postal address represented in the first 
database, 

to search the dictionary for entries in the dictionary corresponding to an input term,

to ascertain information concerning data in the first database representing the or each element corresponding to the or each entry in the dictionary determined by the processor as corresponding to an input term, and

to output data representing the or each postal address, if any, represented by the first database determined by the processor as being in accordance with the input data.



23. Apparatus as claimed in claim 22, the apparatus being arranged to be able to perform a method according to any of claims 2 to 19.

24. Apparatus for retrieving data representing a postal address from a database representing a multiplicity of postal addresses, the apparatus including

a computer processor and one or more databases, accessible by the processor, of data representing a multiplicity of postal addresses, a dictionary of terms found within the postal addresses, and information enabling the processor to link a given entry in the dictionary with the one or more postal addresses in the database having a term corresponding to the dictionary entry,

the processor being programmed to be able to receive input data for finding a postal address in the database, to search the dictionary for entries corresponding to input data, to ascertain if any postal address in the database corresponds sufficiently closely to the input data, and to send output data relating to one or more postal addresses in the database.

25. Apparatus as claimed in claim 24, arranged to be able to perform a method according to claim 21.

26. Computer program product executable in a processor to perform a method as claimed in any of claims 1 to 21.

27. Computer program product as claimed in claim 26 together with a data product, the data product enabling a processor once programmed with the computer program product to perform the method of any of claims 1 to 21.

28. Data product, accessible by a computer processor, 
the data product including 

data representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements,

a dictionary comprising data representing entries, each entry corresponding to at least one postal address element represented by data in the data product, wherein the dictionary is in the form of a tree data structure, having a root node and terminating in a multiplicity of leaves, the path from the root node to a leaf being representative of an element of a postal address, and

data linking a given entry in the dictionary with the one or more postal addresses in the data product having a term corresponding to the dictionary entry.


29. Data product, accessible by a computer processor, 
the data product including 

a first data structure representing a multiplicity of postal addresses, each postal address being formed of one or more postal address elements, the first data structure comprising respective codes representing respective postal address elements,

a second data structure, in the form of a dictionary, comprising a multiplicity of entries, each entry corresponding to at least one postal address element represented by the data in the first data structure,

a third data structure linking each code in the first data structure to data from which the postal element represented by the code can be directly ascertained, and

a fourth data structure comprising data linking 
a given entry in the second data structure with each 
item of data in the first data structure representing 
the postal address element corresponding to the 
entry in the second data structure. 
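The four data structures of claim 29 can be illustrated with plain Python containers. This is a sketch under stated assumptions, not the claimed data product: addresses arrive as a hypothetical `{address_id: [element, ...]}` mapping, the dictionary (second structure) is a sorted list rather than a tree, and the fourth structure records each occurrence as an `(address id, position)` pair, which also illustrates the per-occurrence locations referred to in claim 9. `build_structures` is an illustrative name.

```python
from collections import defaultdict

def build_structures(addresses):
    """Build the four structures of claim 29 from {address_id: [element, ...]} (toy layout)."""
    code_of = {}  # element text -> code (used only while building)
    first = {}    # 1st: address id -> list of element codes
    third = {}    # 3rd: code -> element text it represents
    for addr_id, elements in addresses.items():
        codes = []
        for element in elements:
            if element not in code_of:
                code_of[element] = len(code_of)
                third[code_of[element]] = element
            codes.append(code_of[element])
        first[addr_id] = codes
    second = sorted(code_of)  # 2nd: dictionary entries (a tree in the claims, a list here)
    fourth = defaultdict(list)  # 4th: entry -> (address id, position) of each occurrence
    for addr_id, codes in first.items():
        for pos, code in enumerate(codes):
            fourth[third[code]].append((addr_id, pos))
    return first, second, third, dict(fourth)

first, second, third, fourth = build_structures(
    {1: ["HIGH", "STREET"], 2: ["STATION", "STREET"]})
print(fourth["STREET"])               # [(1, 1), (2, 1)]
print([third[c] for c in first[2]])   # ['STATION', 'STREET']
```

Storing codes rather than text in the first structure means each element string is held once (in the third structure), however many addresses share it.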
Figure 1